Adaptation often involves the acquisition of a large number of genomic changes which arise as
mutations in single individuals. In asexual populations, combinations of mutations can fix only
when they arise in the same lineage, but for populations in which genetic information is exchanged,
beneficial mutations can arise in different individuals and be combined later. In large populations,
when the product of the population size $N$ and the total beneficial mutation rate $U_b$ is large, many
new beneficial alleles can be segregating in the population simultaneously. We calculate the rate of
adaptation, $v$, in several models of such sexual populations and show that $v$ is linear in $NU_b$ only
in sufficiently small populations. In large populations, $v$ increases much more slowly, as $\log NU_b$.
The prefactor of this logarithm, however, increases as the square of the recombination rate. This
acceleration of adaptation by recombination implies a strong evolutionary advantage of sex.



In asexual populations, beneficial mutations arising on different genotypes compete against each other, and in large
populations most of the beneficial mutations are lost because they arise on mediocre genetic backgrounds or acquire
further beneficial mutations less rapidly than their peers: the combined effects of clonal interference and multiple
mutations (Desai and Fisher 2007; Gerrish and Lenski 1998). Exchange of genetic material between individuals
allows the combination of beneficial variants which arose in different lineages, and can thereby speed up the process
of adaptation (Fisher 1930; Muller 1932). Indeed, most life forms engage in some form of recombination, e.g.
lateral gene transfer or competence for picking up DNA in bacteria, facultative sexual reproduction in yeast and
plants, or obligate sexual reproduction in most animals. Some benefits of recombination for the rate of adaptation
have recently been demonstrated experimentally in C. reinhardtii (Colegrave 2002), E. coli (Cooper 2007), and
S. cerevisiae (Goddard et al. 2005); for a review of older experiments see Rice (2002).

Yet the benefits of sex become less obvious when one considers its disadvantageous effects: recombination can
separate well adapted combinations of alleles, and sexual reproduction is more costly than asexual reproduction due
to resources spent for mating and, in some cases, the necessity of males. The latter, in animals often termed
the two-fold cost of sex, implies that sexual populations can be unstable to the invasion of asexual variants. As
a result, the pros and cons of sex have been the subject of many decades of debate in the theoretical literature
(Barton 1995a; Barton and Charlesworth 1998; Crow and Kimura 1965; Felsenstein 1974; Maynard Smith 1968),
and several different potentially beneficial aspects of sex have been identified, including the pruning of
detrimental mutations (Peck 1994; Rice 1998) and host-parasite coevolution or otherwise changing environments
(Callahan et al. 2009; Charlesworth 1993; Gandon and Otto 2007; Ladle et al. 1993; [...]eck 1999).
In the opposite situation of relatively static populations, it has been proposed that
recombination is favored in the presence of negative epistasis (Feldman et al. 1980; Kondrashov 1984, 1988), a
situation in which the combined detrimental effect of two unfavorable alleles is greater than the sum of the individual
effects. While this may sometimes be a significant effect, most populations, especially microbes, are likely to be under
continuing selection, and the benefits of sex for speeding up adaptation are likely to dominate.

The Fisher-Muller hypothesis is that sex speeds up adaptation by combining beneficial variants. Moreover, it has
been demonstrated by Hill and Robertson (1966) that linkage decreases the efficacy of selection. This detrimental
effect of linkage, known as the "Hill-Robertson effect", causes selection for higher recombination rates, which has
been shown by analyzing recombination modifier alleles at a locus linked to two competing segregating loci
(Barton [...]; Iles et al. 2003). Hitchhiking of the allele that increases the recombination rate with the sweeping
linked loci results in effective selection for increased recombination.

Experiments and simulation studies suggest that the Hill-Robertson effect is more pronounced, and selection for
recombination modifiers is stronger, in large populations with many sweeping loci (Colegrave 2002; Felsenstein
1974; Iles et al. 2003). However, the quantitative understanding of the effect of recombination in large populations is
limited. Rouzine and Coffin studied recombination in HIV, finding that recombination of standing variation speeds up
adaptation by producing anomalously fit individuals at the high fitness edge of the distribution
(Gheorghiu-Svirschevski et al. 2007; Rouzine and Coffin 2005). The
effects of epistatic interactions between polymorphisms and recombination on the dynamics of selection have recently
been analyzed by Neher and Shraiman (2009). Yet none of these works consider the effects of new beneficial
mutations. In the absence of new mutations (and in the absence of heterozygous advantage, which can maintain
polymorphisms) the fitness soon saturates as most alleles become extinct and standing variation disappears. Thus
the crucial point which must be addressed is the balance between selection and recombination of existing variation and
the injection of additional variation by new mutations.

Here, we study the dynamics of continual evolution via new mutations, selection, and recombination using several
models of recombination. Our primary models most naturally apply when periods of asexual reproduction occur
between matings, so that they approximate the lifestyle of facultatively outcrossing species such as S. cerevisiae,
some plants, and C. elegans, which reproduce asexually most of the time but undergo extensive recombination when
outcrossing. The models enable us to study analytically the explicit dependence of the rate of adaptation and of the
dynamics of the beneficial alleles on the important parameters such as the outcrossing rate and population size. In an
independent study Barton and Coe (personal communication) calculate the rate of adaptation for obligate sexual
organisms using several different multilocus models of recombination, including the free recombination model studied
here. The relation of our work to theirs, as well as to that of Cohen et al. (2005, 2006), who have
also studied the effects of recombination with multiple new mutations, is commented on in the Discussion section.

When deleterious mutations can be neglected, the rate of adaptation is the product of the rate of production
of favorable mutations $NU_b$ ($N$ being the population size and $U_b$ the genome-wide beneficial mutation rate), the
magnitude of their effect, and their fixation probability. The fixation probability is dominated by the probability
that the allele becomes established, i.e. that it rises to high enough numbers in the population that it is very
unlikely to die out by further stochastic fluctuations. In a homogeneous population a single beneficial mutation with
selective advantage $s$ has a probability of establishment and eventual fixation of $P_e \approx s$ (Moran 1959; see footnote 1).
In a heterogeneous population, however, a novel beneficial mutation can arise on different genetic backgrounds and
its establishment probability will thus vary, being greater if it arises in a well adapted individual. But even well
adapted genotypes soon fall behind due to sweeps of other beneficial mutations and combinations. In order to avoid
extinction, descendants of the novel mutation thus have to move to fitter genetic backgrounds via recombination in
outcrossing events (Rice 2002). As a result the establishment probability decreases as the rate of average fitness
gain, $v$, in the population increases. But the rate of average fitness gain, or equivalently the rate of adaptation, itself
depends on the establishment probability. These two quantities therefore have to be determined self-consistently.
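
The single-background result quoted above can be checked with a few lines of Python. This is only a minimal sketch under stated assumptions: a homogeneous population with no recombination and no background variation, in which a mutant lineage divides at rate $1 + s$ and dies at rate 1. The function names and the choice of a Poisson offspring distribution for the discrete-generation case are illustrative and not taken from the paper.

```python
import math


def establishment_continuous(s):
    """Continuous-time birth-death process: birth rate 1 + s, death rate 1.

    The survival probability w is the nonzero root of
    0 = (B - D) * w - B * w**2, i.e. w = s / (1 + s), close to s for small s.
    """
    birth, death = 1.0 + s, 1.0
    return (birth - death) / birth


def establishment_discrete(s, tol=1e-14):
    """Discrete generations with a Poisson offspring number of mean 1 + s.

    The survival probability solves 1 - w = exp(-(1 + s) * w); the nonzero
    root is found by bisection and is close to 2 s for small s.
    """
    def f(w):
        return 1.0 - w - math.exp(-(1.0 + s) * w)

    lo, hi = 1e-9, 1.0  # f(lo) > 0 and f(hi) < 0 for small positive s
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    s = 0.01
    print(establishment_continuous(s), establishment_discrete(s))
```

The two values differ by roughly a factor of two, which is the distinction between the continuous-time result $P_e \approx s$ and the discrete-generation value $2s$ mentioned in the footnote.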

In this paper we analyze several models via self-consistent calculations of the fixation probability of new mutations.
For a given production rate of beneficial mutations $NU_b$, we find that interference between mutations is of minor
importance if the recombination rate $r$ exceeds $\sqrt{4 s^2 NU_b}$. In this regime, the rate of adaptation is
$v \approx NU_b s^2$, as found for sequential mutations or in the absence of linkage. At recombination rates below
$\sqrt{s^2 NU_b / \log NU_b}$, however, $v$ grows only logarithmically with $NU_b$. We find this behavior in all our
models and argue that it obtains more generally. The prefactor of $\log NU_b$ increases with the square of the
recombination rate, implying a strong benefit of recombination in large populations.



I. MODELS 

We consider a population of haploid individuals with fitness (growth rate) $X$, determined by the additive effects of
a large number of loci, each of which makes a small contribution to the fitness. We assume that selection is weak enough
for the population dynamics to be described by a continuous time approximation, that the population size $N$ is large
enough that $Ns \gg 1$, and that a wide spectrum of fitnesses is present, characterized by the fitness variance $\sigma^2$ of
the population. Individuals divide stochastically with a Poisson rate $1 + X - \bar X(t)$, where $\bar X(t)$ is the mean fitness
in the population, and they die, also stochastically, with rate 1 (that is, we use the death rate to set the unit of time
and assume for convenience that $|X - \bar X(t)| \ll 1$). In addition to this asexual growth, individuals outcross with rate
$r$. Within our models, outcrossing is an independent process decoupled from division (but this does not substantively
affect our results).

The primary model of mating that we study is free recombination. In an outcrossing event two randomly chosen
parents are replaced by two offspring, and each parental allele is assigned at random to one or the other of the
two offspring. This would be exactly correct if all loci were on different chromosomes, and can be a reasonable
approximation when the number of crossover sites is large so that pairs of substantially polymorphic loci are likely to be
unlinked at each mating. At the end, we discuss briefly what happens when this approximation breaks down. When
the number of polymorphic loci is large and their contributions to $X$ are of comparable magnitude, the distribution
of offspring fitness is well described by a Gaussian centered on the value midway between the fitnesses of the
two parents, with variance $\sigma^2/2$ if loci are uncorrelated (Bulmer 1980); this is less than the variance $\sigma^2$ of
the parental population. Note that $\sigma^2$ is proportional to the number of segregating alleles and represents the extent
of genetic variation in the adapting population. It is not a fixed parameter of the model, but is to be calculated
self-consistently as a function of the population size and the mutation and outcrossing rates.

1 In discrete generation models, $P_e \approx 2s$.

In addition to the free recombination model described above, we study two other models. The first is a grossly
simplified model of recombination in which a randomly chosen individual is replaced by an individual whose genome is
assembled by choosing the alleles at each locus according to the allele frequencies in the entire population, independent
of the "parents" (see also Barton and Coe 2009). In this case recombinant offspring have a fitness distribution
identical to the population distribution. It turns out that this communal recombination model, even if unrealistic,
behaves similarly to the free recombination model while being much easier to analyze mathematically. This makes it
a good source of insight and supports the contention that the form of our results is more general than the
particular models.

The free recombination model, and even more so the communal recombination model, overestimate the amount of
gene reassortment during outcrossing events by assuming that all loci are simultaneously unlinked by recombination
to the same extent, independent of their locations on the chromosomes. To study the effects of more persistent genetic
linkage, we also study a third model in which only a single locus is exchanged with a mating partner in an outcrossing
event or, equivalently, is picked up from DNA in the environment and randomly replaces the initial allele at the
same locus. This model is reminiscent of lateral gene transfer among bacteria and related to, but not the same as,
the model studied by Cohen et al. (2005). While this minimal recombination model preserves the linkage of all but
one locus at a time, each locus is equally strongly linked to all other loci. Thus this model does not approximate the
position-dependent crossing-over of chromosomes.

The recombination processes in each of these models are characterized by a rate, $r$, and a function, $K(X, Y, t)$,
which is the distribution of offspring fitness $Y$, given a parent with fitness $X$ mated with a random member of the
population. Being the distribution of offspring fitness, the recombination 'kernel' is normalized: $\int dY\, K(X, Y, t) = 1$.
Furthermore, since we ignore epistasis and assume that loci at intermediate frequencies are in linkage equilibrium,
recombination leaves the fitness distribution $P(X, t)\, dX$ of the population invariant: $\int dX\, K(X, Y, t) P(X, t) = P(Y, t)$.
Within the free recombination model, each outcrossing event replaces two parents with two offspring. However, when
following a rare allele, we can focus on the lineage containing this allele and ignore the fate of the other offspring.
Matings between two individuals with the same rare allele are very infrequent and can be neglected. Since we are
interested in the effects of recombination, we will primarily focus on the limit $r \gg s$.



A. Branching process and establishment probability 

The key element determining the rate of adaptation is the probability that a new beneficial mutation avoids
extinction and establishes in the population. The establishment probability is the probability that the allele survives
random drift and rises to a sufficiently large number so that its frequency in the population grows deterministically
(and eventually fixes). This establishment occurs, if it does at all, when the population of the allele is large but
its frequency in the population is still small. The fate of a new allele during the stochastic phase, when it exists only
in a small fraction of individuals, can be described well by a branching process which accounts for stochastic birth,
death, and, crucially, for recombination events that move some of its descendants from one genetic background to
another. The branching process takes place in a population whose mean fitness is steadily increasing due to beneficial
mutations sweeping and fixing at other loci and in other lineages. Ignoring the short term effect of mutations, the
mean fitness, $\bar X(t)$, increases with rate $v = \frac{d}{dt}\bar X(t) = \sigma^2$, where $\sigma^2$ is the (additive) variance of the fitness.
The dynamics of a novel beneficial mutation linked to a spectrum of genomic backgrounds in a population adapting with
rate $v$ is illustrated in Figure 1. To establish, its descendants have to switch repeatedly to fitter genomic backgrounds.
This general idea (see Rice 2002 for a review) applies to the accumulation of beneficial as well as deleterious mutations.

The establishment probability at a time $t - dt$ of descendants of a genome of fitness $X$, defined as $w(X, t - dt)$, is
simply related to that at time $t$ (Barton 1995b):

$$w(X, t - dt) = w(X, t) - dt\,[D + B(X, t) + r]\, w(X, t) + dt\, B(X, t)\,[2w(X, t) - w(X, t)^2] + dt\, r \int dY\, K(X, Y, t)\, w(Y, t), \tag{1}$$

where $D = 1$ is the death rate and $B(X, t) = 1 + X - \bar X(t)$ the birth rate. After a division, each of the two offspring
has a probability $1 - w$ of extinction, hence a probability $2w - w^2$ that at least one of these offspring fixes. For a
low-frequency allele conferring additional fitness $s$ on a genomic background with fitness $X$, we have $B = 1 + X - \bar X(t) + s$.

In a sufficiently large population the adaptation process will proceed in a steady manner, leading to a fitness
distribution of constant width translating towards higher fitness as a "traveling wave" (Tsimring et al. 1996) with
the velocity set by the rate of increase of the mean fitness, $v = \frac{d}{dt}\bar X(t)$. We make the Ansatz that the distribution of
fitnesses of the population around its mean $\bar X(t)$ does not fluctuate substantially and that the distribution is close
to Gaussian. These are analogous to "mean-field" approximations which must be justified a posteriori. We expect
that such approximations will become valid for sufficiently large populations, but how this occurs, and how large the
population must be, is not clear a priori: we discuss this below.

FIG. 1 A novel mutation needs to recombine onto fitter genetic backgrounds to become established and eventually fix. Panel A:
The distribution in fitness of the population moves towards higher fitness with velocity $v = \sigma^2$. The new mutation, illustrated
by the black bars, has to switch backgrounds by recombination to keep up with the moving wave of the population fitness
distribution. Panel B: Initially, the novel mutation is present on a single genetic background with fitness $X_0$, struggling not to go
extinct. Recombination can transfer the mutated allele onto a new background, e.g. from $X_0$ to $X_1$, and spawn a daughter clone
which starts an independent struggle against extinction. The mutation establishes if at least one branch survives indefinitely.
The figure shows the complementary case of an unsuccessful mutation: all branches die out. The probability of establishment,
$w(X, t)$, depends on the fitness $X$ of the genome in which the mutation arose and is a solution to Eq. (1).

In the traveling wave population, the establishment probability depends on time only via $\bar X(t)$. Hence we measure
fitness relative to $\bar X(t) = vt$, defining $x = X - \bar X(t)$, and seek an otherwise time-independent solution of the form
$w(x) = w(X - vt) = w(X, t)$. (The properties of $w(X, t)$ and $K(X, Y, t)$ do not change by this shift of variables other
than becoming time independent relative to a moving reference $\bar X(t)$. We therefore use the same symbols for $w(x)$
and $K(x, y)$ in the moving frame.) Using $\partial_t w(X - vt) = -v\, \partial_x w(x)$, the establishment probability $w(x)$ then obeys

$$v\, \partial_x w(x) = r \int dy\, K(x, y)\, w(y) + (x + s - r)\, w(x) - (1 + x + s)\, w(x)^2. \tag{2}$$

In many cases of interest, selection is only important on timescales much longer than the generation time. In that
case $x + s$ in the prefactor of the quadratic term is negligible compared to the inverse generation time, which is 1 in
our units. Eq. (2) then simplifies to

$$(v\, \partial_x - x + r)\, w(x) - r \int dy\, K(x, y)\, w(y) = s\, w(x) - w(x)^2. \tag{3}$$

We have written this in a suggestive form. The left hand side of Eq. (3) defines a linear operator $\mathcal{L}$ acting on $w$.
At very high recombination rates, we will obtain that $w(x) \sim (1 + 2x/r)$, which is almost independent of $x$ for $x \ll r$.
In this limit, $\mathcal{L}w$ vanishes and the population average establishment probability is just the solution
to the right-hand side, giving simply $w(x) \approx s$. This is the conventional result (obtained by the simple branching
process) in the absence of linkage to the rest of the genome. More generally, the fixation probability of a new mutation
which can arise in any individual is the population average of the $x$-dependent establishment probability over the
approximately Gaussian distribution of the fitness, $x$:

$$P_e = \int \frac{dx}{\sqrt{2\pi\sigma^2}}\, e^{-x^2/2\sigma^2}\, w(x). \tag{4}$$



Equation (3) has an important property. Its left hand side is zero upon averaging with respect to the population
distribution $P(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-x^2/2\sigma^2}$ (as is readily confirmed by direct integration using $v = \sigma^2$ and
$\int dx\, K(x, y) P(x) = P(y)$, see above). This property originates from the fact that in the deterministic limit (without
the additional mutation, $s$), the population dynamics has $P(X, t) = P(X - vt) = P(x)$ as a traveling wave solution
(Rouzine and Coffin 2005), which was the initial rationale for assuming a Gaussian form. As a consequence, averaging
Eq. (3) yields a "solvability condition"

$$\int \frac{dx}{\sqrt{2\pi\sigma^2}}\, e^{-x^2/2\sigma^2}\, [s\, w(x) - w(x)^2] = 0, \tag{5}$$

which, when combined with Eq. (4), provides another expression for the establishment probability:

$$s P_e = \int \frac{dx}{\sqrt{2\pi\sigma^2}}\, e^{-x^2/2\sigma^2}\, w(x)^2. \tag{6}$$

This equation together with Eq. (3) describes the "surfing" of a beneficial allele (and far more often its drowning!),
the processes illustrated by Figure 1, under the assumption that the distribution of fitness in the population
is sufficiently close to Gaussian. The latter holds when the large number of alleles at different loci are only weakly
correlated: we justify this Ansatz below.



B. Models of recombination 

The recombination kernel $K(x, y)$ depends on the recombination model. For the free recombination model, the
fitness of the offspring resulting from a mating of two parents with fitness $x$ and $z$ is again Gaussian distributed with
mean $(x + z)/2$ and variance $\sigma^2/2$. Averaging over the fitness $z$ of the mate, which is Gaussian distributed with
mean zero and variance $\sigma^2$ in the moving frame, we obtain

$$K(x, y) = \frac{2}{\sqrt{6\pi\sigma^2}}\, e^{-\frac{2(y - x/2)^2}{3\sigma^2}}. \tag{7}$$
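
The two kernel properties used throughout, normalization and invariance of the population distribution, can be verified numerically for the free recombination model. The sketch below assumes the kernel is a Gaussian in $y$ with mean $x/2$ and variance $3\sigma^2/4$, which is what follows from the midparent mean and the segregation variance $\sigma^2/2$ after averaging over a mate drawn from a Gaussian of variance $\sigma^2$. The helper names and the simple trapezoidal rule are illustrative choices, not part of the paper.

```python
import math


def gaussian(x, var):
    """Zero-mean Gaussian density with variance var, evaluated at x."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def free_kernel(x, y, sigma):
    """Offspring fitness density K(x, y) for a focal parent with fitness x.

    Mean x / 2 and variance 3 sigma^2 / 4 (sigma^2 / 4 from the random mate
    plus sigma^2 / 2 from segregation).
    """
    return gaussian(y - 0.5 * x, 0.75 * sigma ** 2)


def trapezoid(f, a, b, n=4000):
    """Composite trapezoidal rule for f on [a, b] with n intervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


if __name__ == "__main__":
    sigma = 1.0
    # Normalization: integral over offspring fitness y at fixed parent x.
    print(trapezoid(lambda y: free_kernel(0.7, y, sigma), -12.0, 12.0))
    # Invariance: integral over parents x of K(x, y) P(x) should equal P(y).
    y0 = 0.4
    lhs = trapezoid(lambda x: free_kernel(x, y0, sigma) * gaussian(x, sigma ** 2), -12.0, 12.0)
    print(lhs, gaussian(y0, sigma ** 2))
```

The invariance check works because the offspring variance is $\sigma^2/4 + 3\sigma^2/4 = \sigma^2$, so the Gaussian population distribution is reproduced exactly.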

In the communal recombination model, the fitness of the recombinant is a random sample from the population
(assuming Gaussianity and linkage equilibrium). In that case, we have

$$K(x, y) = P(y) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-y^2/2\sigma^2}, \tag{8}$$

i.e. the recombination kernel becomes independent of $x$ and Eq. (3) becomes mathematically much simpler.
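
To illustrate how much the communal model simplifies matters, the following sketch solves Eq. (3) numerically for this model. Because the kernel is independent of $x$, the recombination integral reduces to $r P_e$, and Eq. (3) becomes the ordinary differential equation $\sigma^2 w' = (x + s - r) w - w^2 + r P_e$, to be solved together with the condition that the Gaussian average of $w$ equals $P_e$. Everything about the numerical scheme is an assumption made for illustration (fourth-order Runge-Kutta on a fixed grid, a quasi-static starting value far below the mean, bisection on a logarithmic scale, and the parameter values in the usage example); it is not the procedure used by the authors.

```python
import math


def integrate_w(p_e, r, s, sigma, n_steps=2000, span=8.0):
    """RK4 solution of sigma^2 w' = (x + s - r) w - w^2 + r p_e.

    The grid runs from -span * sigma to +span * sigma in the moving frame.
    """
    var = sigma ** 2
    x0 = -span * sigma
    h = 2.0 * span * sigma / n_steps

    def rhs(x, w):
        return ((x + s - r) * w - w * w + r * p_e) / var

    # Far below the mean, w is small and follows the recombination source term.
    w = r * p_e / (r - s - x0)
    xs, ws = [x0], [w]
    for i in range(n_steps):
        x = x0 + i * h
        k1 = rhs(x, w)
        k2 = rhs(x + 0.5 * h, w + 0.5 * h * k1)
        k3 = rhs(x + 0.5 * h, w + 0.5 * h * k2)
        k4 = rhs(x + h, w + h * k3)
        w += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        xs.append(x + h)
        ws.append(w)
    return xs, ws


def population_average(xs, fs, sigma):
    """Trapezoidal average of fs over the Gaussian fitness distribution P(x)."""
    norm = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    total = 0.0
    for i in range(len(xs) - 1):
        a = fs[i] * math.exp(-xs[i] ** 2 / (2.0 * sigma ** 2))
        b = fs[i + 1] * math.exp(-xs[i + 1] ** 2 / (2.0 * sigma ** 2))
        total += 0.5 * (a + b) * (xs[i + 1] - xs[i])
    return norm * total


def solve_p_e(r, s, sigma, iterations=50):
    """Find the nonzero p_e satisfying <w> = p_e by bisection on a log scale."""
    lo, hi = 1e-12, 1.0
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)
        xs, ws = integrate_w(mid, r, s, sigma)
        if population_average(xs, ws, sigma) > mid:
            lo = mid  # assumed p_e too small: the self-consistent value lies above
        else:
            hi = mid
    return math.sqrt(lo * hi)


if __name__ == "__main__":
    s, sigma = 0.01, 0.05  # hypothetical values, in units of the death rate
    for r in (0.1, 0.5):
        print(r, solve_p_e(r, s, sigma) / s)
```

With these hypothetical parameters the ratio $P_e/s$ is close to one when $r$ is ten times $\sigma$ and drops as $r$ approaches $\sigma$, in line with the qualitative picture described in the text. Note that here $\sigma$ is treated as a given input; in the full theory it has to be determined self-consistently.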

Within the minimal recombination model, the probability per unit time of any particular locus being transferred is
$r$, and the sections are assumed small enough that they contain at most one segregating locus. From the point of view
of a single mutant, there are two processes: either it can be transferred to another genome, which is effectively like
the recombination process in the communal recombination model, or other sections can be transferred into its genome,
gradually changing its fitness. With small sections transferred, the fitness of the genome undergoes a random walk with
bias towards the average fitness. The corresponding recombination operator is then

$$r \int dy\, K(x, y)\, w(y) = r \int \frac{dy}{\sqrt{2\pi\sigma^2}}\, e^{-y^2/2\sigma^2}\, w(y) + r \left[ \sigma^2 \frac{d^2 w}{dx^2} - x \frac{dw}{dx} \right]. \tag{9}$$

This form of the recombination operator is derived in Appendix C. Note that for the minimal recombination
model the recombination operator acting on $P(x)$ is different from the adjoint operator acting on $w(y)$.



II. RESULTS 

A. Fixation probability and rate of adaptation

To calculate the rate of adaptation, we solved Eq. (3) and obtained expressions for the average fixation probability
$P_e$ of a beneficial mutation, which is of the form $P_e = \sigma p_e(\tilde r, \tilde s)$, where $\tilde s = s/\sigma$ and $\tilde r = r/\sigma$ are the selective
advantage of the beneficial mutation and the outcrossing rate rescaled by the to-be-determined width of the fitness
distribution $\sigma$. The expression for $P_e$ is used later to calculate $\sigma^2$ in a self-consistent manner. The derivations of
the expressions for $P_e$ in the different models are given in the following section. In the limit $s \ll r$, our primary focus,
we find for the free recombination model

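$$p_e(r/\sigma, s/\sigma) \approx \begin{cases} \tilde r^2 \log(c \tilde r/\tilde s)\, e^{-\log^2(c \tilde r/\tilde s)/2\tilde r^2} & s \ll r \ll \sigma \\ \tilde s\, (1 - 4/\tilde r^2 + \ldots) & r \gg \sigma \end{cases} \tag{10}$$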



with $c$ a coefficient (see footnote 2). At small $r$, the fixation probability decreases very rapidly with decreasing $r$.
This stems from the fact that mutations in individuals from the high fitness tail of the Gaussian fitness distribution
have an exponentially greater chance of fixing than those in the bulk. At large $r$, by contrast, the genetic background
on which the mutation arises plays only a minor role, since the rate of switching background is larger than the selection
differentials. While starting out on a fit background gives a mutation a slight advantage, mutations on any background
have a significant chance of fixing. For large $r$, the result for $P_e$ is therefore given by small perturbations of the
result without background interference: $P_e \approx s$.

The expressions for $P_e$ presented above depend on the variance in fitness $\sigma^2$. In an evolving population the variance
is not a free parameter. When the effects of mutation on the bulk of the fitness distribution can be neglected, as they
can here, the variance is equal to the rate of adaptation, $v$. The rate of adaptation, in turn, is given by the product of
the rate at which beneficial mutations enter the population $NU_b$, the magnitude of their effect $s$, and their probability
of fixation:

$$v = NU_b\, s\, \sigma\, p_e(r/\sigma, s/\sigma) = \sigma^2. \tag{11}$$

The rate of adaptation, $v$, can therefore be obtained by solving self-consistently for $\sigma$ in the above equations.
Substituting our result for $P_e$ and ignoring logarithmic factors in the arguments of large logarithms, we find, for the free
recombination model,

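$$v \approx \begin{cases} \dfrac{2 r^2 \log(NU_b)}{\log^2(r/s)} & r/s \ll \sqrt{NU_b/\log NU_b} \\ NU_b s^2 \left( 1 - \dfrac{4 NU_b s^2}{r^2} + \ldots \right) & r^2 \gg 4 NU_b s^2 \end{cases} \tag{12}$$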



Contrary to intuition, $v$ is proportional to $\log NU_b$ rather than $NU_b$, both for low $r$ at fixed $NU_b \gg 1$ and at fixed
$r$ for sufficiently large population sizes $N$. This indicates that the interplay between mutations, especially their
collective effects on fluctuations, is limiting the rate of adaptation (Gillespie 2001). As in the asexual case,
because of interference between mutations, only a small fraction $\sim \log(NU_b)/NU_b$ of the beneficial mutations fix;
the rest are wasted. However, this fraction increases with increasing rate of recombination, leading to $v$ increasing
as $\sim r^2 \log NU_b$ until it saturates at $NU_b s^2$, which is the limit of independently fixing mutations. In this high
recombination limit, the rate of adaptation is limited simply by the supply of beneficial mutations $NU_b$. Very similar
results for the dependence of $v$ on $r$ and $N$ are obtained for the communal recombination model, differing only by
coefficients inside logarithms and by correction terms.

In the minimal recombination model, for which only one locus is exchanged at a time, the behavior is slightly
different. For the fixation probability, we find

$$P_e \sim e^{-\sigma^2/2r^2 + s\sigma^2/r^3}. \tag{13}$$

In contrast to the other models, for which recombination results in a macroscopic change of the genotype, the minimal
recombination model only changes one locus at a time. This results in a slightly weaker dependence of $P_e$ on the
recombination rate for $r \gg s$. Determining the fitness variance self-consistently as before gives the speed of adaptation:

$$v \approx 2 r^2 \log(NU_b)\, (1 + 2s/r). \tag{14}$$

Surprisingly, this result is essentially independent of $s$ for $r \gg s$: the larger increase in the fitness per sweep is almost
perfectly canceled by the decrease in establishment probability. Note that this model is defined with recombination
rate $r$ per locus, so that the total number of recombinations in time $1/r$ is far more than in the other models. But the
time for turnover of the genome and loss of linkage is of order $1/r$, and thus $r$ is the useful quantity to compare with
the other models.
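
The self-consistency step for the minimal recombination model can be illustrated numerically. The sketch below makes a strong simplifying assumption that should be kept in mind: it treats the scaling relation for $P_e$ as an equality with unit prefactor, $P_e = \exp(-\sigma^2/2r^2 + s\sigma^2/r^3)$, inserts it into $v = NU_b\, s\, P_e = \sigma^2$, and solves for $v$ by bisection. The function names and parameter values are hypothetical, and agreement with the asymptotic formula is only expected up to the logarithmic corrections that the text says are ignored.

```python
import math


def adaptation_rate_minimal(nub, r, s):
    """Solve v = NU_b * s * exp(-v / (2 r^2) + s v / r^3) for v by bisection.

    Assumes r > 2 s, so that the exponent decreases with v.
    """
    a = 1.0 / (2.0 * r ** 2) - s / r ** 3

    def f(v):
        return v - nub * s * math.exp(-a * v)

    lo, hi = 0.0, nub * s  # f(lo) < 0 and f(hi) >= 0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def asymptotic_rate(nub, r, s):
    """Leading-order expression v = 2 r^2 log(NU_b) (1 + 2 s / r)."""
    return 2.0 * r ** 2 * math.log(nub) * (1.0 + 2.0 * s / r)


if __name__ == "__main__":
    r, s = 0.1, 0.01  # hypothetical values, in units of the death rate
    for nub in (1e4, 1e6, 1e8):
        print(nub, adaptation_rate_minimal(nub, r, s), asymptotic_rate(nub, r, s))
```

Increasing $NU_b$ by four orders of magnitude changes $v$ only by a factor of order two, while doubling $r$ raises $v$ by roughly a factor of three to four, which is the logarithmic dependence on $NU_b$ and the approximately quadratic dependence on $r$ described above.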



2 Note that in the limit of very small $s$, $s < \exp(-\sigma^2/r^2)$, the expressions break down. This is unlikely to be relevant in practice.
B. Simulations

In writing down Eq. (3) for the establishment probability of a beneficial mutation, we have assumed that the
distribution of fitness in the population is Gaussian and that correlations and fluctuations are negligible. Thus it is
useful to compare the analytic results to individual-based simulations of an evolving population. In our simulations,
we use a discrete generation scheme, where each individual produces a Poisson distributed number of gametes with
parameter $\exp(X - \bar X + \alpha)$. The population size, $N$, is kept approximately constant with an average of $N_0$ by adjusting
the overall rate of replication through $\alpha = (1 - N/N_0) \log 2$. Each individual is represented by a string of integers,
where each bit represents one locus. Recombination, approximating the free recombination model, is implemented as
follows: each generation, gametes are randomly placed into a pool of asexual gametes with probability $1 - r$ and into
a pool of sexual gametes with probability $r$. The asexual gametes are placed unchanged into the next generation.
The sexual gametes are paired at random and their genes reassorted to produce haploid offspring. Whenever a
locus becomes monomorphic, via fixation or extinction of an allele, one individual is chosen at random and
a mutation is introduced at that specific locus. This allows us to make optimal use of the computational resources
by keeping as many polymorphic loci as possible. However, this scheme renders the beneficial mutation rate, $U_b$, a
dependent quantity which, as shown in Fig. 2, increases with the genome size $L$ and decreases with $r$. The effective total
rate for new beneficial mutations, $NU_b$, can be determined simply by measuring the average rate at which the new mutations
are introduced (which, the way the simulations are done, is the sum of the extinction and fixation rates).
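
The simulation scheme described above can be sketched in a few dozen lines. This is a minimal illustration and not the authors' code: genomes are stored as Python lists of 0/1 alleles with additive effect $s$ per derived allele, the Poisson sampler is a textbook multiplication method, an unpaired sexual gamete is simply treated as asexual, and a locus that has fixed the derived allele is reset to 0 in every individual before a new mutation is injected (this only shifts all fitnesses by the same constant). All names and parameter values are hypothetical.

```python
import math
import random


def poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def next_generation(pop, s, r, n_target, rng):
    """One discrete generation: selection, outcrossing, mutation injection."""
    fitness = [s * sum(g) for g in pop]
    mean_fit = sum(fitness) / len(pop)
    # Density regulation keeps the population size close to n_target.
    alpha = (1.0 - len(pop) / n_target) * math.log(2.0)
    asexual, sexual = [], []
    for genome, x in zip(pop, fitness):
        for _ in range(poisson(math.exp(x - mean_fit + alpha), rng)):
            (sexual if rng.random() < r else asexual).append(list(genome))
    rng.shuffle(sexual)
    if len(sexual) % 2:  # an unpaired gamete is passed on unchanged
        asexual.append(sexual.pop())
    for a, b in zip(sexual[0::2], sexual[1::2]):
        for locus in range(len(a)):  # free recombination: reassort every locus
            if rng.random() < 0.5:
                a[locus], b[locus] = b[locus], a[locus]
    new_pop = asexual + sexual
    for locus in range(len(new_pop[0])):
        count = sum(g[locus] for g in new_pop)
        if count == 0 or count == len(new_pop):
            for g in new_pop:  # a fixed allele becomes the new baseline
                g[locus] = 0
            rng.choice(new_pop)[locus] = 1  # inject a fresh mutation
    return new_pop


if __name__ == "__main__":
    rng = random.Random(1)
    n_target, n_loci = 200, 20
    pop = [[0] * n_loci for _ in range(n_target)]
    for _ in range(50):
        pop = next_generation(pop, s=0.02, r=0.5, n_target=n_target, rng=rng)
    print(len(pop))
```

Counting how often the injection branch is executed per generation would give the effective $NU_b$ mentioned in the text; that bookkeeping is left out here to keep the sketch short.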

Figure 2 shows the mean establishment probability as a function of the outcrossing rate $r$, for different values of
$L$, which is roughly proportional to $NU_b$ (see above). The establishment probability is small at small $r$ but increases
sharply and saturates at high $r$ at $P_e = 2s$, the usual single-locus result. The upturn of $P_e$ occurs at larger $r$ for
larger $NU_b$, in accord with the prediction that the high recombination limit is reached when $r$ substantially exceeds
$\sigma$. The agreement between the analytic predictions in the Gaussian Ansatz (via numerical solution of Eq. (3))
and the simulation improves as $NU_b$ increases, suggesting that, as we expect, the approximations used become valid
for large populations. Note, however, that the corrections to the asymptotic results are quite large, as the basic
small parameter of the Gaussian Ansatz is inversely proportional to $\log(NU_b)$. The right panel of Figure 2 shows
$w(x)$, i.e. the establishment probability of a mutation arising on background $x$, measured in simulations together
with the predictions obtained from numerical solution of Eq. (3). At outcrossing rates much larger than $\sigma$, the
fixation probability increases only slightly with the background fitness and all new mutations have a substantial
chance, of order $s$, to establish. With decreasing $r/\sigma$, the establishment probability becomes a steeper function
of the background fitness and only those mutations arising on high fitness backgrounds have a significant chance of
establishment. Note that at $r/\sigma \approx 1$, $w(x)$ measured in simulations decays less rapidly at small $x$ than the solution of
Eq. (3). These deviations are probably due to fluctuations of the high fitness edge and the width of the distribution,
which are ignored in the analysis. However, as discussed below, such fluctuations decrease with increasing $NU_b$ as
long as $r > s$.



III. ANALYSIS OF ESTABLISHMENT PROBABILITY 



We now turn to a derivation of the results given for the establishment probability in Eqs. (10) and (13), which
requires solving Eq. (3). We first study the case of $s \ll r \ll \sigma$, applicable, as we shall see, for very large populations.
We proceed by analysing Eq. (3) in different regimes of $x$. At large positive $x - r \gg \sigma$, the equation reduces to
$(x - r)\, w(x) \approx w^2(x)$ with solution $w_>(x) \approx x - r$, as illustrated in Figure 3. In this regime, $w(x)$ is independent of the
recombination model and is simply given by the establishment probability of a mutation in the absence of any gains
from recombination (but with the clonal growth rate reduced by $r$ due to recombination). Establishment is driven
by clonal expansion and contributions from recombination are negligible. (But we shall see that there are almost no
individuals in the population with such high fitness.) In the opposite regime, at large negative $x$, $w(x)$ is small and
the quadratic term, as well as the perturbation $s\, w(x)$, can be neglected. The resulting linear equation for $w_<(x)$, valid
for small $x$, is

$$(v\, \partial_x - x + r)\, w_<(x) - r \int dy\, K(x, y)\, w_<(y) = 0. \tag{15}$$

In this regime, the solution depends sensitively on the recombination model. This is intuitive, since the only (and
very unlikely) way for a mutation on such an unfit background to fix is to recombine onto better backgrounds. We will
verify below for each model separately that the crossover from the linear regime, $w_<(x)$, to the saturated behavior at
large $x$, $w_>(x)$, occurs rather sharply around $x/\sigma = \Theta \gg 1$. At intermediate $\sigma < x < \sigma\Theta$, the establishment probability
$w_<(x)$ increases steeply (while remaining small enough for the quadratic term to remain negligible). Individuals in
this intermediate regime are much fitter than the average individual so that recombination usually leads to less fit




FIG. 2 Fixation probabilities in recombining populations. Panel A shows the mean fixation probability normalized to the
value in the high recombination limit as a function of $r$ for three different genome sizes $L$ (with $s = 0.002$, $N = 20000$). The
effective rate of beneficial mutations $NU_b$ is shown in the inset (see main text). The scaled fixation probability in the simulation
(solid lines) is calculated as $v/(2NU_b s^2)$ and compared to the analytic results for the scaled establishment probability $P_e(r, \sigma)/s$
(dashed lines). The latter are obtained through numerical solution of Eq. (3) using $\sigma^2$ observed in simulations. The agreement
between simulations and the analytic approximation improves with increasing $L$, i.e. increasing $NU_b$, as expected. Panel B:
The scaled fixation probability as a function of the rescaled background fitness $x/\sigma$ (relative to the mean). The solid lines are
simulation results for $w(x)$ divided by $2s$ using $L = 6400$ and $r = 0.512$, 0.128, 0.064 and 0.032; the corresponding values
of the key ratio $r/\sigma$, which determines the shape of $w(x)$, are indicated in the figure. The dashed lines are predictions for
$w(x)/s$ obtained via numerical solutions of Eq. (3). Note that the simulation data become noisy when the frequency of $x$ in
the population is around $1/N$.



offspring. Hence the recombination term is of secondary importance in this range and $w_<(x)$ is governed by the first
term in Eq. (15). The solution to Eq. (15) is therefore of the form $w_<(x) = \phi(x)\,e^{(x-r)^2/2\sigma^2}$, where $\phi(x)$ is a slowly
varying function that depends on the recombination model. This behavior can be interpreted in terms of the dynamics
of a genotype with initial fitness $x$. The genotype will expand clonally with rate $x - r$, giving rise to approximately
$n_x(t) \sim e^{(x-r)t - vt^2/2}$ unrecombined descendants after $t$ generations. Since each of these could give rise to a lineage
which will fix, in this regime $w(x)$ is proportional to $\int n_x(t)\,dt$, which increases rapidly with $x$. This is valid up to
just below the crossover where the quadratic term, $w(x)^2$, starts to be important, see Fig. 3.
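The clonal-expansion argument can be checked numerically. The following is a minimal sketch, assuming the number of unrecombined descendants has exactly the form $n_x(t) = e^{(x-r)t - vt^2/2}$ with $v = \sigma^2$; the function names and parameter values are illustrative and not taken from the paper. It compares a direct numerical integral of $n_x(t)$ with the gaussian estimate $\sqrt{2\pi/v}\,e^{(x-r)^2/2v}$, which carries the same steep dependence on $x$ as $w_<(x)$.

```python
import math

def descendants_integral(a, v, steps=50_000):
    """Trapezoidal estimate of the integral of exp(a*t - v*t**2/2) over t >= 0.

    a = x - r is the clonal growth rate, v = sigma**2 the speed of the wave.
    """
    # the integrand peaks at t = a/v with width 1/sqrt(v); integrate far past the peak
    t_max = max(a, 0.0) / v + 12.0 / math.sqrt(v)
    h = t_max / steps
    total = 0.5 * (1.0 + math.exp(a * t_max - 0.5 * v * t_max**2))
    for i in range(1, steps):
        t = i * h
        total += math.exp(a * t - 0.5 * v * t * t)
    return total * h

def gaussian_estimate(a, v):
    """Saddle-point value sqrt(2*pi/v) * exp(a**2 / (2*v))."""
    return math.sqrt(2.0 * math.pi / v) * math.exp(a * a / (2.0 * v))

sigma = 0.02          # illustrative standard deviation of fitness
v = sigma**2
# ratio of the numerical integral to the gaussian estimate for (x - r)/sigma = 1..4
ratios = {k: descendants_integral(k * sigma, v) / gaussian_estimate(k * sigma, v)
          for k in (1, 2, 3, 4)}
```

Under these assumptions the ratio approaches one as $(x - r)/\sigma$ grows, so the gaussian factor captures the growth of $\int n_x(t)\,dt$ once $x - r$ exceeds a few $\sigma$.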



Note that the amplitude of $w_<(x)$ is left undetermined by the homogeneous linear equation (15) and hence the
location $\Theta$ of the crossover is not fixed. To ensure that $w(x)$ solves the complete Eq. (3), we need to impose the
"solvability condition" Eq. (5) as an additional constraint. The solvability condition involves the first and second
moment of $w(x)$ with respect to the fitness distribution $P(x)$. The first moment is dominated by small and intermediate
$x$ since $P(x)w(x)$ decreases with $x$. The second moment, however, is dominated by a narrow range of width $\sim \sigma/\Theta \ll \sigma$
around the crossover point $\sigma\Theta$: for $x \lesssim \sigma\Theta$, $P(x)w_<(x)^2$ increases rapidly with $x$, while $P(x)w_>(x)^2$ decreases rapidly.
The "solvability condition" (5) then becomes

$$sP_e \approx \frac{\sigma^2\,\Theta\,e^{-\Theta^2/2}}{\sqrt{2\pi}}, \tag{16}$$



giving us a relation between $P_e$ and $\Theta$. To analyze the behavior of the various models it is convenient to rescale the
rates and fixation probabilities as

$$\chi = x/\sigma, \quad \tilde r = r/\sigma, \quad \tilde s = s/\sigma, \quad \text{and} \quad \tilde w(\chi) = w(\chi\sigma)/\sigma. \tag{17}$$



Utilizing the transform,

$$\Omega(z) = \int_{-\infty}^{\infty}\frac{d\chi}{\sqrt{2\pi}}\,e^{-(\chi - z)^2/2}\,\tilde w(\chi), \tag{18}$$

turns out to be informative: note that the scaled fixation probability is $p_e = P_e/\sigma = \Omega(z = 0)$. By integrating the

FIG. 3 Asymptotics of the establishment probability (horizontal axis: fitness $x$ in units of $\sigma$). The fitness distribution $P(x)$ of the population is shown in black, a
sketch of the establishment probability, $w(x)$, is shown in red for $r < \sigma$. At low $x$, $w(x)$ is small and depends sensitively on
the recombination model; at intermediate $\sigma < x < \sigma\Theta$, $w(x)$ increases sharply as $\sim e^{(x-r)^2/2\sigma^2}$, modulated by a slowly varying
function $\phi(x)$ which depends on the recombination model. At still larger $x$, beyond $\sigma\Theta$, the quadratic term in Eq. (3) becomes
important, forcing $w(x)$ to saturate at $x - r$. The width of the crossover region is of the order of $\sigma/\Theta$.



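rescaled Eq. (3) over the kernel $e^{-(\chi - z)^2/2}/\sqrt{2\pi}$, we obtain an equation for $\Omega$ of the form

$$\mathcal{L}\Omega = \int_{-\infty}^{\infty}\frac{d\chi}{\sqrt{2\pi}}\,e^{-(\chi - z)^2/2}\,\mathfrak{L}\tilde w(\chi) = \tilde s\,\Omega(z) - \int_{-\infty}^{\infty}\frac{d\chi}{\sqrt{2\pi}}\,e^{-(\chi - z)^2/2}\,\tilde w(\chi)^2, \tag{19}$$

which defines for each model a linear operator $\mathcal{L}$ acting on $\Omega(z)$ ($\mathfrak{L}$ is the linear operator defined by the left hand side
of Eq. (3)). The integral over $\tilde w(\chi)^2$ is again dominated by the crossover region and can be evaluated using $\tilde w(\Theta) \approx \Theta$
and the (scaled) crossover width $\sim \Theta^{-1}$:

$$\int_{-\infty}^{\infty}\frac{d\chi}{\sqrt{2\pi}}\,e^{-(\chi - z)^2/2}\,\tilde w(\chi)^2 \approx \frac{\Theta\,e^{z\Theta - \Theta^2/2 - z^2/2}}{\sqrt{2\pi}} = \tilde s\,p_e\,e^{z\Theta - z^2/2}. \tag{20}$$

The last step was obtained by substituting Eq. (16). The condition that the solution $\tilde w_<(\chi)$ joins smoothly to the
saturated solution $\tilde w_>(\chi)$, and hence only grows slowly for large $\chi$, translates into the condition that $\Omega(z)$ does not
diverge at any fixed $z$: it should be an analytic function of $z$. We now examine separately the different models,
simplest first.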



A. Communal recombination model. 

In the communal recombination model, the genotypes of offspring are independent of their parental fitness, which
makes this model particularly simple. It can, in fact, be solved exactly, as shown in Appendix A, or, in the regimes
of interest, by matched asymptotic expansions. But it is more instructive to proceed with the approximate but more
general and asymptotically exact analysis outlined above. The equation for $\Omega(z)$ reads

$$\mathcal{L}_c\Omega = (\tilde r - z)\,\Omega(z) - \tilde r p_e = \tilde s\,\Omega(z) - \tilde s\,\Omega(0)\,e^{z\Theta - z^2/2}, \tag{21}$$

which can be solved trivially. But in general it has a pole at $z = \tilde r - \tilde s$. This pole has to be canceled, since we know
that $\tilde w(\chi)$ saturates at $\chi \approx \Theta$ and $\Omega(z)$ cannot develop a singularity. Hence, we must have $e^{\Theta(\tilde r - \tilde s) - (\tilde r - \tilde s)^2/2} = \tilde r/\tilde s$ to
eliminate the pole. Solving for $\Theta$ and substituting it into the solvability condition (16) yields



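$$p_e \approx \frac{\log(\tilde r/\tilde s)}{\tilde s(\tilde r - \tilde s)\sqrt{2\pi}}\exp\left[-\frac{\left(\log(\tilde r/\tilde s) + (\tilde r - \tilde s)^2/2\right)^2}{2(\tilde r - \tilde s)^2}\right] \approx \frac{\log(\tilde r/\tilde s)}{\tilde s\,\tilde r\sqrt{2\pi}}\,e^{-\log^2(\tilde r/\tilde s)/2\tilde r^2}. \tag{22}$$

The last approximate equality is correct to leading order in $\tilde s/\tilde r \ll 1$.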
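The pole-cancellation recipe can be turned into a few lines of code. This is a hedged sketch rather than the authors' implementation: it assumes the pole condition $e^{\Theta(\tilde r - \tilde s) - (\tilde r - \tilde s)^2/2} = \tilde r/\tilde s$ and the scaled solvability condition $\tilde s\,p_e = \Theta\,e^{-\Theta^2/2}/\sqrt{2\pi}$, and the function name and parameter values are illustrative.

```python
import math

def communal_theta_and_pe(r_t, s_t):
    """Crossover point Theta and scaled establishment probability p_e.

    r_t and s_t stand for the rescaled rates r/sigma and s/sigma, with s_t < r_t.
    """
    if not 0.0 < s_t < r_t:
        raise ValueError("expected 0 < s_t < r_t")
    d = r_t - s_t
    # pole condition: Theta*d - d**2/2 = log(r_t/s_t)
    theta = math.log(r_t / s_t) / d + 0.5 * d
    # solvability condition: s_t * p_e = Theta * exp(-Theta**2/2) / sqrt(2*pi)
    p_e = theta * math.exp(-0.5 * theta**2) / (s_t * math.sqrt(2.0 * math.pi))
    return theta, p_e

theta_low, pe_low = communal_theta_and_pe(r_t=0.5, s_t=0.05)
theta_high, pe_high = communal_theta_and_pe(r_t=1.0, s_t=0.05)
```

With these illustrative numbers, doubling $\tilde r$ at fixed $\tilde s$ moves the crossover point inward and raises $p_e$ by several orders of magnitude, which is the strong $r$ dependence discussed in the text.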



B. Free recombination model 

In the free recombination model, the offspring obtains on average half of its genome from either parent. The parent
carrying the new allele mates with a random member of the population: thus after recombination the average fitness
of the genotype carrying the new allele is half as far from the population mean fitness as it was before recombination.
As a result of this correlation between parents and offspring, the operator $\mathcal{L}_f$ for the free recombination model is
more complicated and couples $\Omega(z)$ to $\Omega(z/2)$:

$$\mathcal{L}_f\Omega = (\tilde r - z)\,\Omega(z) - \tilde r\,\Omega(z/2) \approx \tilde s\,\Omega(z) - \tilde s\,\Omega(0)\,e^{z\Theta - z^2/2}, \tag{23}$$

where, as before, $p_e = \Omega(0)$. Neglecting the $e^{-z^2/2} \approx 1$ on the right hand side (we need only consider $z \ll 1$ since
$\tilde r \ll 1$), we can analyze this as a power series in $z$, writing $\Omega(z) = \sum_n \Omega_n z^n$, finding

$$\frac{\Omega_n}{\Omega_0} = \prod_{k=1}^{n}\frac{1}{\tilde r - \tilde s - \tilde r 2^{-k}} - \tilde s\sum_{j=1}^{n}\frac{\Theta^j}{j!}\prod_{k=j}^{n}\frac{1}{\tilde r - \tilde s - \tilde r 2^{-k}}. \tag{24}$$

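As the first part would yield ratios of successive terms which approach $1/(\tilde r - \tilde s)$ for large $n$ and again induce a pole
at $z = \tilde r - \tilde s$, this has to be canceled by the second inhomogeneous term. The condition for convergence (up to well
beyond the "almost-pole" at $\tilde r - \tilde s$) is that $\Omega_n(\tilde r - \tilde s)^n \to 0$ for $n \to \infty$, which requires that

$$1 = \tilde s\sum_{j=1}^{\infty}\frac{\Theta^j}{j!}\prod_{k=1}^{j-1}\left(\tilde r - \tilde s - \tilde r 2^{-k}\right) \approx \frac{\tilde s}{\tilde r}\,e^{\Theta(\tilde r - \tilde s)}\prod_{k=1}^{\infty}\left(1 - 2^{-k}\right). \tag{25}$$

The last approximate equality is accurate when $\tilde s \ll \tilde r$ and hence $\Theta(\tilde r - \tilde s) \gg 1$. Thus we must have

$$\Theta \approx \frac{\log(c\,\tilde r/\tilde s)}{\tilde r - \tilde s} \tag{26}$$

with the order-unity coefficient $c = 1/\prod_{k=1}^{\infty}(1 - 2^{-k})$. We thus obtain $p_e$ very similar to the communal recombination
model,

$$p_e \approx \frac{\log(c\,\tilde r/\tilde s)}{\tilde s(\tilde r - \tilde s)\sqrt{2\pi}}\,e^{-\log^2(c\,\tilde r/\tilde s)/2(\tilde r - \tilde s)^2}. \tag{27}$$

Note that $\Omega(z)$ is approximately the Laplace transform of $\phi(\chi) = \tilde w(\chi)\,e^{-\chi^2/2}$, which can be analyzed perturbatively
for small $\tilde r$, see Appendix B. This expansion in $\tilde r$ reveals the most probable (least unlikely) path of a mutation
on a typical initial background to successively better backgrounds and establishment.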
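The order-unity coefficient is easy to evaluate numerically. The sketch below assumes $c = 1/\prod_{k\ge1}(1 - 2^{-k})$ together with $\Theta \approx \log(c\,\tilde r/\tilde s)/(\tilde r - \tilde s)$ and the scaled solvability condition $\tilde s\,p_e = \Theta\,e^{-\Theta^2/2}/\sqrt{2\pi}$; the function names, the truncation `k_max` and the parameter values are illustrative choices, not part of the original analysis.

```python
import math

def c_partial(k_max):
    """Value of 1 / prod_{k=1}^{k_max} (1 - 2**-k); tends to c as k_max grows."""
    prod = 1.0
    for k in range(1, k_max + 1):
        prod *= 1.0 - 2.0 ** (-k)
    return 1.0 / prod

def free_theta_and_pe(r_t, s_t, k_max=60):
    """Theta and p_e for the free recombination model, intended for s_t << r_t."""
    c = c_partial(k_max)
    theta = math.log(c * r_t / s_t) / (r_t - s_t)
    p_e = theta * math.exp(-0.5 * theta**2) / (s_t * math.sqrt(2.0 * math.pi))
    return theta, p_e

# partial products converge quickly: 2, 8/3, ... towards c of about 3.46
c_values = [c_partial(k) for k in (1, 2, 3, 10, 60)]
theta_free, pe_free = free_theta_and_pe(r_t=0.5, s_t=0.05)
```

Since $c > 1$ enters only through the logarithm, the crossover point of the free model lies slightly further out than in the communal model at the same $\tilde r$ and $\tilde s$.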



C. Minimal recombination model 

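The minimal recombination model can be analyzed similarly: $\mathcal{L}_m$ is now a differential operator, and we have

$$\mathcal{L}_m\Omega = (\tilde r - z)\,\Omega - \tilde r p_e + \tilde r z\,\partial_z\Omega \approx \tilde s\,\Omega - \tilde s\,\Omega(0)\,e^{z\Theta}. \tag{28}$$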

This can be explicitly integrated and the behavior for $1 \gg z > O(\tilde r)$ found to involve linear combinations of $e^{z/\tilde r}$ and
$e^{z\Theta}$. For $\tilde s \ll \tilde r$, the condition that the solution matches correctly onto the non-linearly saturated form for $\chi \approx \Theta$
can be shown to be that these two exponentials are almost the same. This yields the condition $\Theta \approx 1/\tilde r$. In contrast
to the other models, $\tilde s$ only gives corrections to $\Theta$. The fixation probability is then found to be

$$P_e \sim e^{-1/2\tilde r^2 + \tilde s/\tilde r^3}, \tag{29}$$

which yields a different form for the speed of evolution:

$$v \approx 2r^2\log(N\mu)\,(1 + 2s/r). \tag{30}$$
D. High recombination rates

In the limit of high recombination rate, the crossover to the saturated solution $w_>(x)$ occurs far out in the "nose"
(high fitness tail) of the population distribution, further out than any individuals are likely to be. In this regime,
the assumption that $\int d\chi\,e^{-\chi^2/2}\,\tilde w(\chi)^2$ is dominated by the crossover region is no longer justified.

To analyse this high $r$ regime, we can make use of the expansion $\Omega(z) = \sum_n z^n\,\Omega_n$, which is equivalent to
expanding $\tilde w(\chi)$ in Hermite polynomials, $\tilde w(\chi) = \sum_n \Omega_n H_n(\chi)$, where $H_n(\chi) = (-1)^n e^{\chi^2/2}\,\partial_\chi^n e^{-\chi^2/2}$. In the limit
of $r \gg s$, the second term in Eq. (24) can be neglected for the first few coefficients and we have $\Omega_n/\Omega_0 = \prod_{i=1}^{n} 1/[\tilde r(1 - 2^{-i})]$
(for the communal recombination model we have $\Omega_n/\Omega_0 = \tilde r^{-n}$). The value of $\Omega_0 = p_e$ has to be determined by the
solvability condition $\tilde s\,p_e = \int \frac{d\chi}{\sqrt{2\pi}}\,e^{-\chi^2/2}\,\tilde w(\chi)^2$. From the orthogonality of the Hermite polynomials one finds that
the right hand side is simply $\sum_n n!\,\Omega_n^2$. Hence, we find for the fixation probability the formal expression

$$p_e = \frac{P_e}{\sigma} = \tilde s\left(1 + \sum_{n=1}^{\infty} n!\prod_{i=1}^{n}\frac{1}{\tilde r^2(1 - 2^{-i})^2}\right)^{-1}. \tag{31}$$

The $n!$ would cause the sum to diverge if it extended to infinity. But for large $\tilde r$, this is a valid asymptotic series,
which can be truncated at any finite number of terms. To zeroth order, one finds in both models $P_e = \sigma\tilde s = s$, which
is simply the result in a homogeneous population. Including the first two non-trivial correction terms, one finds

$$\begin{aligned} P_e &= s\left(1 - 4\tilde r^{-2} + \tfrac{16}{9}\tilde r^{-4} + \cdots\right) && \text{free recombination model} \\ P_e &= s\left(1 - \tilde r^{-2} - \tilde r^{-4} + \cdots\right) && \text{communal recombination model} \end{aligned} \tag{32}$$

[Note that the divergence of the expansion for large $n$, for which this approach breaks down, is related to the singular
dependence of $p_e$ on $1/\tilde r$ for small $\tilde r$ discussed above.] For the minimal recombination model, the behavior for large $r$
is similar and the expansion in inverse powers of $\tilde r$ can be analyzed: we do not carry this out here.
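The leading corrections can be recomputed with exact rational arithmetic. This is a small sketch under stated assumptions: it takes the formal expression to be $p_e = \tilde s\,[1 + \sum_{n\ge1} n!\,(\Omega_n/\Omega_0)^2]^{-1}$ with $\Omega_n/\Omega_0 = \prod_{i=1}^{n} 1/[\tilde r(1 - 2^{-i})]$ for free recombination and $\tilde r^{-n}$ for communal recombination, and expands the inverse in powers of $u = \tilde r^{-2}$. The helper names are illustrative.

```python
from fractions import Fraction
from math import factorial

def series_terms(ratio_sq, n_max):
    """Coefficients a_n of u**n (u = 1/r_tilde**2) in sum_n n! (Omega_n/Omega_0)**2.

    ratio_sq(i) is the u-independent part of the squared ratio Omega_i/Omega_{i-1}.
    """
    terms = [Fraction(1)]
    prod = Fraction(1)
    for n in range(1, n_max + 1):
        prod *= ratio_sq(n)
        terms.append(factorial(n) * prod)
    return terms

def invert_series(a, n_max):
    """Coefficients b_n of 1 / (sum_n a_n u**n), assuming a[0] == 1."""
    b = [Fraction(1)]
    for n in range(1, n_max + 1):
        b.append(-sum(a[k] * b[n - k] for k in range(1, n + 1)))
    return b

# coefficients of 1, r_tilde**-2, r_tilde**-4 in P_e / s
free = invert_series(series_terms(lambda i: 1 / (1 - Fraction(1, 2**i)) ** 2, 2), 2)
communal = invert_series(series_terms(lambda i: Fraction(1), 2), 2)
```

Truncating at higher order only requires raising `n_max`, but because of the $n!$ growth the extra terms improve the estimate only while $\tilde r$ is large enough for successive terms to keep shrinking.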



E. Range of validity of analysis 

Throughout the analysis, we have assumed that the fitness distribution of individuals in the population, $P(x = X - \bar X(t))$, is gaussian, and also that of recombinant offspring. Crucially, for the analysis, we assumed that it
remains gaussian in the high-fitness nose of the distribution all the way to the crossover point which controls the 
establishment probabilities. We need to justify this Ansatz. First, as noted earlier, we observe that a gaussian fitness 
distribution is the exact traveling-wave solution to the linear recombination model in the absence of fluctuations: 
the gaussian approximation should thus be valid throughout the bulk of the distribution in the limit of very large 
populations. Second, in the absence of fluctuations (or epistatic interactions which we are ignoring in any case) 
the frequencies of alleles at different loci are independent. And third, if the establishment probabilities of different 
beneficial mutations are independent, then it can be shown that the resulting Poisson process of the establishments 
together with random combining of the alleles with their corresponding frequencies leads to a distribution of fitnesses 
whose logarithm averaged over the establishment times, $\langle\log P(x)\rangle$, is exactly parabolic, corresponding to a
gaussian distribution. However, due to fluctuations and correlations, the distribution of fitnesses will be neither 
exactly gaussian nor exactly time-independent and we must check that the non-fluctuating gaussian is a good enough 
approximation far enough out in the nose in the large N regimes of interest. 

We first check that the sampling of the distribution due to the finite population size is sufficient. A population
of size $N$ samples a close-to-gaussian distribution only out to about $\sigma\sqrt{2\log N}$ ahead of the mean. But this implies
that, with the fitnesses of individuals only weakly correlated, the crossover region near $\sigma\Theta$ is indeed well sampled by
the population since

$$\Theta \approx \frac{\sigma\log(cr/s)}{r} = \sqrt{2\log NU_b} < \sqrt{2\log N}. \tag{33}$$

The last inequality is valid when the rate of beneficial mutations per genome per generation, $U_b$, is small, as is surely
always the case: there are then of order $1/U_b$ individuals in the population with fitnesses in the crucial crossover
region of the establishment probabilities. Furthermore, the gaussian shape of the fitness distribution will be a
good approximation when the number of polymorphic loci that contribute substantially to the fitness variance is
large. However, the total number of established polymorphic loci is dominated by low frequency alleles. (The total
number of polymorphic loci is much higher still, but almost all of these are not established and destined to soon go
extinct.) Nevertheless, there are sufficiently many polymorphic sites with high enough frequencies that they contribute
substantially to the fitness distribution. Since sweeps occur at rate $v/s$ and since a sweeping allele is at intermediate
frequencies for a few times $1/s$ generations, the number of loci, $K$, contributing substantially to the variance is of
order $v/s^2 \sim (r/s)^2\log(NU_b)$. For $r \gg s$ these $K$ loci are approximately in linkage equilibrium, giving rise to a
gaussian fitness distribution with corrections to parabolic $\log P(x)$ of order $(x/\sigma)^2/K$. At the crossover point, $\sigma\Theta$,
it can then be checked that the corrections to $P(x)$ are small as long as $r > s\sqrt{\log NU_b}$. We thus expect that this is
the condition for validity of the gaussian Ansatz from which our analytic predictions are obtained. A more detailed
analysis of the effects of fluctuations, in particular in the crucial "nose" of the distribution, is left for future work.
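To get a feeling for the magnitudes involved, the estimates above can be evaluated for concrete numbers. This is a back-of-the-envelope sketch: it assumes $\Theta = \sqrt{2\log NU_b}$, a sampled range of $\sqrt{2\log N}$ standard deviations, a count $N e^{-\Theta^2/2}$ of individuals beyond the crossover (all prefactors dropped) and $K = (r/s)^2\log(NU_b)$. The parameter values are purely illustrative.

```python
import math

def crossover_sampling(N, U_b, r, s):
    """Order-of-magnitude quantities for the validity check (O(1) factors ignored)."""
    theta = math.sqrt(2.0 * math.log(N * U_b))   # crossover point in units of sigma
    edge = math.sqrt(2.0 * math.log(N))          # sampled extent of the gaussian, units of sigma
    n_cross = N * math.exp(-0.5 * theta**2)      # individuals beyond the crossover, about 1/U_b
    K = (r / s) ** 2 * math.log(N * U_b)         # loci contributing to the fitness variance
    return theta, edge, n_cross, K

theta, edge, n_cross, K = crossover_sampling(N=1e9, U_b=1e-4, r=0.1, s=0.01)
# the number of combinations of K biallelic loci, 2**K, compared with N on a log scale
log_combinations = K * math.log(2.0)
log_population = math.log(1e9)
```

For these values the crossover lies inside the sampled range, roughly $1/U_b$ individuals sit beyond it, and the number of possible allele combinations vastly exceeds the population size.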



IV. DISCUSSION 



We have analyzed in several simple models the dependence of the speed of adaptation on the rate of recombination
and the population size, focusing on the particularly interesting behavior in the wide range of outcrossing rates
$s\sqrt{\log NU_b} \ll r \ll s\sqrt{NU_b/\log NU_b}$, or equivalently, on population sizes $NU_b \gg (r^2/s^2)\log(r/s)$. In the high
recombination limit and moderate $N$ the conventional analysis of independent fixations holds and the rate of adaptation
(and concomitantly the variance of fitness) is proportional to the total production rate of beneficial mutations,
$NU_b$. In contrast, for large populations (with recombination rates in the intermediate regime) we find adaptation
rate $v \sim r^2\log NU_b$. This change from linear to logarithmic dependence on $NU_b$ indicates that the rate of adaptation
is limited by interference among multiple simultaneously segregating beneficial mutations rather than by the supply
of beneficial mutations. This reduction in the rate of adaptation due to linkage is, qualitatively, the Hill-Robertson
effect (Hill and Robertson 1966). Most interestingly, while logarithmic in population size, the rate of adaptation
increases with the rate of recombination as $r^2$. Hence our results confirm the heuristic arguments by Fisher and
Muller and provide a quantitative framework for identifying conditions favoring sexual reproduction (Barton and
Charlesworth 1998; Rice 2002).



The rate of adaptation is determined by the dynamics of the linkage between new beneficial alleles and the spectrum 
of fitnesses of the rest of the genome. This results in most new mutations being eliminated by their linkage to modestly 
fit genomes which rapidly lose out with respect to the steadily increasing average fitness driven by the anomalously fit 
genomes. Only those alleles that either arise on very fit genomes or are lucky enough to recombine to make a very fit 
genome will survive long enough for their frequency to grow deterministically and sweep through the population. The 
logarithmic dependence on population size is similar to that found for purely asexual evolution when multiple beneficial 
mutations are present in the population (Desai and Fisher 2007). But with $r > s$, recombination speeds up the
adaptation by allowing new mutations that arise on modestly fit backgrounds to recombine to very fit backgrounds
and thereby fix. 

We have shown that the typical number of simultaneously segregating alleles at intermediate frequencies is on the
order of $K \sim (r^2/s^2)\log NU_b$. For $r \gg s$, the number of possible combinations of these sweeping loci therefore
dramatically exceeds the population size. This implies that the limit of "infinite" population size, for which each genotype is
well-sampled, is unattainable at fixed recombination and beneficial mutation rate. On the contrary, sampling becomes
sparser and the benefits of recombination more pronounced in larger populations. The population size dependence
of the beneficial effects of recombination has been a subject of considerable theoretical debate (Barton and Otto
2005; Crow and Kimura 1965; Maynard Smith 1968). The increased advantage of sexual reproduction in large
populations has been demonstrated in model simulations by Iles et al. (2003). It has also been observed experimentally
by Colegrave (2002), who studied this phenomenon in an evolution experiment with C. reinhardtii.



A. Relationship to other recent work 



The description of the spread of beneficial alleles in space as a traveling wave goes back to Fisher (1930). The
notion that adaptation of a panmictic population can be described as a travelling wave in fitness was introduced by
Kepler and Perelson (1995) and Tsimring et al. (1996). In these effectively deterministic models, the velocity
of the pulse is determined by the size of the population through a modification of the deterministic solution at the
high fitness edge (the "nose" or "front") to approximate the crucial stochastic behavior near the nose (Brunet
and Derrida 1997).



These concepts were applied to recombining populations by Rouzine and Coffin (2005)
and Gheorghiu-Svirschevski et al. (2007), who studied the rate of (transient) adaptation when selection acts on
standing variation. Cohen et al. (2005, 2006) studied continuing evolution with a large supply of beneficial mutations
available in a model that is related to our "minimal recombination" model. Both approaches focused on the overall
distribution of fitnesses within the population and the primary role of recombination they considered was to maintain a
near gaussian shape of the fitness distribution, achieved by producing higher fitness individuals and thereby advancing
the nose. Some of the results of the approximate analytic treatments are related to ours, including the $\log N$ scaling
of the adaptation speed in certain regimes. Yet the actual underlying dynamics implicit in the approximations used
are very different from what we find here and so is the dependence on parameters.

The key feature of the adaptation with substantial rates of recombination is the stochastic dynamics of new
mutations. The probability that a new beneficial mutation will sweep to fixation is determined by its establishment
probability: the probability that it escapes stochastic extinction. The establishment probability depends very strongly
on the distribution of fitnesses of the genetic backgrounds with which the new mutation can be linked. As the
distribution of fitness depends on the velocity, the steady-state velocity must be determined by matching the rate of
establishment of new alleles with the velocity of the deterministic traveling wave describing the fitness distribution 
in the population. The latter is driven by the continuous incorporation of a large number of new sweeping alleles 
that have successfully established at earlier times. At any time there is thus a broad distribution of frequencies of the 
beneficial alleles. The primary problem with the earlier analysis is that the distribution and dynamics of individual 
allele frequencies is not treated directly and the approximations implicitly made for their forms are not consistent 
with the basic processes. 

In contrast with the asexual traveling wave for which a description in terms of a simple traveling wave is valid
(Desai and Fisher 2007; Rouzine et al. 2008) and the diversity within the population can be ignored, with any
amount of recombination, the diversity and distribution of allele frequencies is absolutely crucial. It matters a great
deal whether the advance of the fitness wave occurs via small amounts of each of several new alleles, or all from a 
single allele. This information is lost by treatments in terms of the fitness distribution alone. Note that in general 
this is also true for adaptation from standing variation: beneficial alleles initially at low frequencies can be driven 
extinct by their linkage to different backgrounds. If all are initially at sufficiently high frequencies to avoid this fate, 
then neither linkage nor recombination play much role in the dynamics of the adaptation. 

The models we have studied were inspired by facultatively mating organisms, in which outcrossing occurs at rate
$r$. Barton and Coe (pers. comm.) have recently performed a related analysis for obligate sexual reproduction. In
addition to a model with a linear genetic map (see below), they study the free and minimal recombination models, for
which they find similar logarithmic dependence on the population size and mutation rate. Their discrete generation 
models with obligate mating do not reveal the dependence of the rate of adaptation on the outcrossing rate, one of 
the results of our analysis, but a similar behavior is implicit in their results. 



B. Extensions and open questions 



In this paper we focused on the effect of recombination with r > s in simple models of mating without chromosomal 
organization and without epistasis. We conclude by considering going beyond these simplifying limits. 

We first consider decreasing the recombination rate. In comparing our analytic results on the free recombination
model with the direct simulations we found good agreement at high recombination rates, which confirms the accuracy
of the simplifying assumptions made in analyzing the model (i.e. Eq. (3)). At lower recombination rates we observed
that our "mean-field" treatment of the recombination underestimates the rate of adaptation. This is due to the
gradual appearance of "fat tails" in the distribution of fitness: specifically, the high fitness nose of the distribution
decays more slowly than the gaussian assumed in the analysis. The fluctuations in the time of establishment of the
currently intermediate frequency alleles become important. Some of the causes of this can be studied analytically.
The primary effect is the smaller number of segregating loci, of order $v/s^2 = \sigma^2/s^2$, at low recombination rates.



As the ratio r/s decreases further, the acquisition of further beneficial mutations near the nose of the distribution 
— which dominates the asexual evolution — starts to become important. Correlations between loci caused by this 
process and other sources, will also play important roles. 

The behavior of the leading edge of the fitness distribution is known to be the key factor in determining the speed
of adaptation in the asexual limit of $r \to 0$ (Desai and Fisher 2007) and it will be of critical importance in the
$r \ll s$ regime. A correct treatment of this regime, connecting with the known results for asexual adaptation (Brunet
et al. 2008; Desai and Fisher 2007; Rouzine et al. 2008), requires analyzing the diversity that is generated by
the asexual process and the effects of small amounts of recombination on this. It is worth noting that within our
approximations, for the low recombination regime with $r \ll s$, the branching process analysis yields an adaptation
speed for all three models of the form $v \sim s^2\log(N\mu)/\log^2(s/r)$, which is a similar form to the asexual result,
$v \approx 2s^2\log(N\sqrt{\mu s})/\log^2(s/\mu)$. This suggests that in spite of the breakdown of the assumptions, the approximations
may give reasonable results, although not asymptotically accurate ones, even for $s \gg r \gg \mu$. But we leave this regime,
which is particularly important for microbes with rare genetic exchange, for future investigations.

Our analysis has focused on the simple approximation of additive growth rate (equivalent to multiplicative fitnesses 
in a discrete-generation model) . Some of the most interesting extensions of the present models would include epistasis 
— i.e. genetic interactions — which makes the effect of each allele explicitly dependent on its genetic background. This 
dependence can be very complex, resulting in low heritability of fitness, in the sense that the fitness of recombinant
progeny may be only weakly correlated with the fitness of the parents. Remarkably, in the limit of very strong epistasis
(Neher and Shraiman 2009) the establishment probability of an allele is described by a model which reduces to
the communal recombination model described above. The speed of adaptation is, however, determined by a different
self-consistency condition which will be presented elsewhere. In general, how to set up (never mind analyze!)
instructive models of evolutionary dynamics with epistasis between many segregating loci is largely an open field.

Another important simplification in the free recombination model studied here is the random reassortment of the 
parental alleles ignoring the physical arrangement of the genes. More realistic models would account for the linear 
arrangement of genes on the chromosomes such that chromosomal proximity implies low recombination rate. In this 
case, the number of independently transmitted loci in the event of mating is the product of the number of chromosomes 
and the crossovers per chromosome. When the number of substantially polymorphic loci is sufficiently large, the free 
recombination approximation will certainly break down. But in facultatively mating organisms where periods of
asexual reproduction are interspersed by outcrossing events, much reassortment can occur. Indeed, some facultative
outcrossers have high crossover rates (e.g. S. cerevisiae (Mancera et al. 2008)). In this case the free recombination
model can have a reasonable regime of validity. More generally, the fact that our three rather different models
yield similar behavior for the adaptation rates at large population sizes suggests that the forms of the dependence
on parameters, especially speed proportional to $\log(NU_b)$, may be valid much more broadly. Arguments to be
presented elsewhere suggest that the balance between the lengths of linked regions and the number of polymorphic loci
in them can result in $v \sim rs\log(NU_b)$ in some regimes. Significant progress in the analysis of the rate of adaptation
with linear chromosomes has recently been made by Barton and Coe. They invoke a scaling argument and use a
perturbative analysis of nearby pairs of segregating loci to derive an expression for the rate of adaptation. In this
approximation, the rate of acquisition of beneficial mutations tends to an upper limit independent of the population
size, selection coefficient, or mutation rate, being solely determined by the map length: in our notation this would be
equivalent to $v \approx Crs$ with $C$ a constant. Note that this is similar to the conjecture quoted above but without the
$\log(NU_b)$ factor. To check whether the approximations are accurate with many concurrent sweeps it will be necessary
to go beyond the perturbative analysis of Barton and Coe. Furthermore, the interplay between the effectively
asexual evolution of short regions of the chromosome that are linked for long times, and recombination between and
within them, needs to be understood and could well change the behavior qualitatively.

The challenges of understanding evolutionary dynamics in the presence of many beneficial alleles and recombination 
between linear chromosomes, and of understanding the effects of epistatic genetic interactions, provide many important 
open problems. 
Appendix A: Exact solution of the communal recombination model

In the communal recombination model the genotype of recombinant offspring is assembled at random from the 
alleles segregating in the population and therefore independent of the fitness of the parents. The equation describing 
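the establishment probability, Eq. (3), therefore simplifies to

$$\partial_\chi\tilde w(\chi) = \tilde r p_e + (\chi + \tilde s - \tilde r)\,\tilde w(\chi) - \tilde w(\chi)^2, \tag{A1}$$

where all rates, the fitness and $\tilde w(\chi)$ have been rescaled by the standard deviation of the fitness distribution, as in
Eq. (17). The quadratic term can be removed by substituting $\tilde w(\chi) = \partial_\chi\psi(\chi)/\psi(\chi)$, which gives rise to the equation

$$\partial_\chi^2\psi(\chi) - \tilde r p_e\,\psi(\chi) - (\chi + \tilde s - \tilde r)\,\partial_\chi\psi(\chi) = 0. \tag{A2}$$

A second substitution of $\psi(\vartheta) = e^{\vartheta^2/4}\,\phi(\vartheta)$ with $\vartheta = \chi + \tilde s - \tilde r$ maps Eq. (A2) onto the parabolic cylinder equation

$$\partial_\vartheta^2\phi(\vartheta) - \left(\frac{\vartheta^2}{4} + \tilde r p_e - \frac{1}{2}\right)\phi(\vartheta) = 0. \tag{A3}$$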



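The solution with the correct asymptotic behavior is $\psi_{\tilde r p_e}(\chi) = e^{\vartheta^2/4}\,U(\tilde r p_e - 1/2, -\vartheta)$ and has the integral representation
(Abramowitz and Stegun (1964), formula 19.5.1)

$$\psi_{\tilde r p_e}(\chi) = \ldots \tag{A4}$$

From $\psi_{\tilde r p_e}(\chi)$, we obtain $\tilde w(\chi)$ by taking the log derivative $\tilde w_{\tilde r p_e}(\chi) = \partial_\chi\log\psi_{\tilde r p_e}(\chi)$. The asymptotics of $\tilde w_{\tilde r p_e}(\chi)$ in
the different regimes are

$$\tilde w_{\tilde r p_e}(\chi) \approx \begin{cases} \dfrac{\tilde r p_e}{\tilde r - \tilde s - \chi} & \chi \ll \tilde r - \tilde s \\[2mm] \sqrt{2\pi}\,\tilde r p_e\,e^{(\chi + \tilde s - \tilde r)^2/2} & \tilde r - \tilde s \ll \chi \ll \sqrt{-2\log \tilde r p_e} \\[2mm] \chi + \tilde s - \tilde r & \chi \gg \sqrt{-2\log \tilde r p_e} \end{cases} \tag{A5}$$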



as found via the perturbative scheme in the main text. The fixation probability entered Eq. ( Al I as a free parameter 



and has to be fixed such that f —i=e 



. (x) = Pe, which results in a very similar condition for p e as the 



solvability condition of the perturbative scheme used in the main text. 
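
The three regimes can be illustrated by integrating the establishment-probability equation numerically. The following is a minimal sketch, assuming the equation has the Riccati form $\partial_x \tilde w = \tilde r\tilde p_e + (x + \tilde s - \tilde r)\tilde w - \tilde w^2$. The function names, the Runge-Kutta scheme, the step size and all parameter values are illustrative choices made here, not taken from the paper, and the product $\tilde r \tilde p_e$ is passed in as a fixed number `rp` rather than determined self-consistently.

```python
import math


def integrate_w(rp, s, r, x_start=-12.0, x_end=8.0, h=1e-3):
    """Integrate dw/dx = rp + (x + s - r) * w - w**2 with classical RK4.

    rp is the (hypothetical) fixed value of the product r * p_e; s and r are
    the rescaled selection coefficient and recombination rate.
    Returns two lists: the grid xs and the solution ws on that grid.
    """
    def rhs(x, w):
        return rp + (x + s - r) * w - w * w

    n_steps = int(round((x_end - x_start) / h))
    xs = [x_start]
    # Start on the far-left asymptote w ~ rp / (r - s - x).
    ws = [rp / (r - s - x_start)]
    for i in range(n_steps):
        x, w = xs[-1], ws[-1]
        k1 = rhs(x, w)
        k2 = rhs(x + 0.5 * h, w + 0.5 * h * k1)
        k3 = rhs(x + 0.5 * h, w + 0.5 * h * k2)
        k4 = rhs(x + h, w + h * k3)
        ws.append(w + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0)
        xs.append(x_start + (i + 1) * h)
    return xs, ws


def w_at(xs, ws, x):
    """Value of the numerical solution at the grid point closest to x."""
    i = int(round((x - xs[0]) / (xs[1] - xs[0])))
    return ws[i]


def asymptote_left(x, rp, s, r):
    """Regime x << r - s: w ~ rp / (r - s - x)."""
    return rp / (r - s - x)


def asymptote_mid(x, rp, s, r):
    """Intermediate regime: w ~ sqrt(2 pi) * rp * exp((x + s - r)**2 / 2)."""
    return math.sqrt(2.0 * math.pi) * rp * math.exp(0.5 * (x + s - r) ** 2)


def asymptote_right(x, rp, s, r):
    """Regime of large x: w ~ x + s - r."""
    return x + s - r


if __name__ == "__main__":
    params = dict(rp=1e-6, s=0.01, r=0.1)  # illustrative values only
    xs, ws = integrate_w(**params)
    for x, ref in ((-5.0, asymptote_left), (3.0, asymptote_mid), (8.0, asymptote_right)):
        print(x, w_at(xs, ws, x), ref(x, **params))
```

Forward integration is used because the left asymptote is attracting for $x + \tilde s - \tilde r < 0$ and the branch $\tilde w \approx x + \tilde s - \tilde r$ is attracting on the right, so errors in the starting value decay quickly.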



Appendix B: The low recombination limit of the free recombination model 

In the intermediate regime where the recombination term and the quadratic term in Eq. ^ are both small, the 
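fixation probability is of the form $\tilde w(x) = \phi(x)\, e^{(x + \tilde s - \tilde r)^2/2}$, where $\phi(x)$ is a slowly varying function compared to the Gaussian growth term. Ignoring the quadratic term, the equation for $\phi(x)$ reads 

$$\partial_x \phi(x) = 2\tilde r\, e^{-x(\tilde r - \tilde s) + \frac{3}{2}(\tilde r - \tilde s)^2} \int \frac{d\eta}{\sqrt{6\pi}}\, e^{-\frac{\left(\eta - (2x - 3(\tilde r - \tilde s))\right)^2}{6}}\, \phi(\eta) . \tag{B1}$$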



Hence, the dominant contribution to the recombination term comes from $\eta = 2x - 3(\tilde r - \tilde s) \approx 2x$. The function $\phi(x)$, however, drops to zero rapidly beyond $\Theta$, implying that $\phi(x)$ is approximately constant in the interval $\Theta/2 < x < \Theta$. 
To study the behavior of $\phi(x)$ more systematically, it is useful to rearrange Eq. (23) as 

$$\Omega(z) = \frac{\tilde s \tilde p_e\, e^{z\Theta} - \tilde r\, \Omega(z/2)}{z - \tilde r} , \tag{B2}$$



where we assumed $\tilde r \gg \tilde s$ and $z \ll 1$ such that $\tilde s$ in the denominator and $e^{-z^2/2}$ can be neglected. Assuming small $\tilde r$, this equation can be solved iteratively. The two terms on the right, however, have to be matched to cancel the pole at $z = \tilde r$, which can be done by adjusting $\Theta$ for each order in the iterative solution. Starting with $\Omega^{(0)}(z) = \tilde p_e$, we have 

$$\Omega^{(1)}(z) = \frac{\tilde s \tilde p_e\, e^{z\Theta_1} - \tilde r \tilde p_e}{z - \tilde r} \quad \text{with} \quad \Theta_1 = \frac{\log(\tilde r/\tilde s)}{\tilde r} .$$

Iterating Eq. (B2), it is found that 

$$\Theta_k = \frac{\log(c_k \tilde r/\tilde s)}{\tilde r} \quad \text{with} \quad c_k = \prod_{n=1}^{k-1} \frac{1}{1 - 2^{-n}} , \tag{B3}$$

which rapidly converges to the value of the crossover point found by power series expansion of $\Omega(z)$ in Eq. (26). The solution to $k$-th order reads 



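$$\Omega^{(k)}(z) = \tilde s \tilde p_e \sum_{j=0}^{k-1} \frac{(-\tilde r)^j\, e^{z \Theta_k 2^{-j}}}{\prod_{i=0}^{j} \left(2^{-i} z - \tilde r\right)} + \frac{(-\tilde r)^k\, \tilde p_e}{\prod_{i=0}^{k-1} \left(2^{-i} z - \tilde r\right)} , \tag{B4}$$

where all poles are canceled by zeros of the numerator. For small $z$, $\Omega(z)$ is related to the Laplace transform of the function $\phi(x)$ in the variable $z - \tilde r$. 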



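$$\Omega(z) = \ldots \tag{B5}$$

[The right-hand side of Eq. (B5), an integral over $x$ involving $\phi(x)$, is not recoverable from the source.]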
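
The convergence of the pole-matching condition can be made concrete with a few lines of code. The sketch below assumes that the cut-off at order $k$ is $\Theta_k = \log(c_k \tilde r/\tilde s)/\tilde r$ with $c_k = \prod_{n=1}^{k-1} (1 - 2^{-n})^{-1}$, which is what iterating Eq. (B2) gives for $\tilde r \gg \tilde s$. The function names and the parameter values are hypothetical and chosen only for illustration.

```python
import math


def c(k):
    """c_k = prod_{n=1}^{k-1} 1 / (1 - 2**-n); the empty product gives c_1 = 1."""
    prod = 1.0
    for n in range(1, k):
        prod /= 1.0 - 2.0 ** (-n)
    return prod


def theta(k, r, s):
    """Theta_k = log(c_k * r / s) / r for rescaled rates with r >> s."""
    return math.log(c(k) * r / s) / r


if __name__ == "__main__":
    # Illustrative rescaled rates (assumptions, not values from the paper).
    r, s = 0.1, 0.001
    for k in (1, 2, 3, 5, 10, 20):
        print(k, c(k), theta(k, r, s))
```

Because each additional factor differs from one by only $2^{-n}$, the product settles after a handful of iterations, so the position of the cut-off changes very little beyond the first few orders.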



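Since $\phi(x)$ is essentially zero for $x > \Theta$, it is useful to change variables to $\rho = \Theta - x$ and consider the Laplace transform on $\rho \in [0, \infty[$: 

$$\Omega(z) = \ldots \tag{B6}$$

[The right-hand side of Eq. (B6) is not recoverable from the source.]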



where we dropped the $z^2$ and $\tilde r^2$ terms. We can now back-transform $\Omega^{(k)}(z)$, Eq. (B4), into $x$-space and obtain an approximation for $\phi(x)$. The inverse transform of terms of the form $e^{-s\tau}/(s + a)^{n+1}$, where $s$ here denotes the Laplace variable conjugate to $\rho$, is $e^{-a(\rho - \tau)}\, \frac{(\rho - \tau)^n}{n!}\, u(\rho - \tau)$, with $u(x)$ being the Heaviside function. The most important observation is that the delay $\tau = \Theta(1 - 2^{-j})$ is different for the different orders and that higher order terms come in only below a cut-off set by this delay: 

$$\phi(x) \approx \sum_{j=0}^{k-1} (-\tilde r)^j f_j(\rho)\, u(\rho + \Theta 2^{-j} - \Theta) = \sum_{j=0}^{k-1} (-\tilde r)^j f_j(\Theta - x)\, u(\Theta 2^{-j} - x) . \tag{B7}$$

Here, $f_j(\rho)$ is a polynomial in $\rho$ multiplied by a slowly varying exponential $\exp(\tilde r \rho)$ ($\tilde r \ll 1$). This behavior of $\phi(x)$ (and $\tilde w(x)$) has a simple interpretation: for $\Theta/2^j < x < \Theta/2^{j-1}$, the least unlikely way for a new mutation that starts on a background with fitness $x$ to fix is to recombine $j$ times, each time getting closer to the front at $\Theta$, beyond which it can rise to a high level without further recombination. 
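
The delayed inverse transform used in this back-transformation is the standard shift rule for Laplace transforms, and it can be checked numerically. The sketch below compares a trapezoid-rule estimate of the forward transform of $e^{-a(\rho - \tau)} (\rho - \tau)^n / n!\; u(\rho - \tau)$ with the closed form $e^{-s\tau}/(s + a)^{n+1}$. All function names, the integration cut-off and the parameter values are assumptions made for this illustration.

```python
import math


def delayed_term(rho, a, n, tau):
    """exp(-a (rho - tau)) (rho - tau)^n / n! for rho >= tau, zero before the delay."""
    if rho < tau:
        return 0.0
    d = rho - tau
    return math.exp(-a * d) * d ** n / math.factorial(n)


def laplace_numeric(f, s, upper=60.0, steps=60000):
    """Composite trapezoid estimate of the integral of exp(-s rho) f(rho) on [0, upper]."""
    h = upper / steps
    total = 0.5 * (f(0.0) + math.exp(-s * upper) * f(upper))
    for i in range(1, steps):
        rho = i * h
        total += math.exp(-s * rho) * f(rho)
    return total * h


def laplace_exact(s, a, n, tau):
    """Closed form exp(-s tau) / (s + a)^(n + 1)."""
    return math.exp(-s * tau) / (s + a) ** (n + 1)


if __name__ == "__main__":
    a, tau, s = 0.5, 1.5, 1.0  # illustrative values only
    for n in (0, 1, 2):
        approx = laplace_numeric(lambda rho: delayed_term(rho, a, n, tau), s)
        print(n, approx, laplace_exact(s, a, n, tau))
```

The Heaviside factor is what makes each order of the expansion switch on only beyond its own delay, which is the origin of the nested intervals in $x$.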

Appendix C: Minimal recombination model 

In the minimal recombination model, the allele at each locus is exchanged for a random allele from the population 
at rate $r$. Let locus $i$ of a particular individual be in state $s_i \in \{0, 1\}$ and assume the beneficial variant is present in the population at frequency $p_i$. The expected change in fitness upon exchange of locus $i$ is therefore 

$$\langle \Delta x_i \rangle = s\left[p_i (1 - s_i) - (1 - p_i) s_i\right] = s(p_i - s_i) . \tag{C1}$$

Similarly, the variance of the increment is given by 

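$$\langle (\Delta x_i - \langle \Delta x_i \rangle)^2 \rangle = s^2 \left(p_i + s_i - 2 p_i s_i - (p_i - s_i)^2\right) = s^2 p_i (1 - p_i) , \tag{C2}$$

where we have used $s_i = s_i^2$. Assuming each locus undergoes exchange with rate $r$, the drift and diffusion coefficients of the fitness $x$ are given by 

$$\langle \Delta x \rangle = -r\left(\chi - \bar\chi(t)\right) = -rx \quad \text{and} \quad \langle (\Delta x - \langle \Delta x \rangle)^2 \rangle = r\sigma^2 . \tag{C3}$$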

These diffusion and drift processes are represented by the second and third terms of Eq. ^ . The possibility that the 
novel mutation itself is exchanged into a new genome is described by the first term. 
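
The single-locus moments can be verified by direct enumeration of the two possible outcomes of an exchange. The sketch below assumes, as in the text, that the incoming allele is beneficial with probability $p_i$ and that each beneficial allele contributes $s$ to fitness. The function names and the example values are hypothetical.

```python
def exchange_moments(p, s_i, s):
    """Exact mean and variance of the fitness change when one locus is exchanged.

    The new state is 1 with probability p and 0 otherwise; the current state
    is s_i (0 or 1), and a beneficial allele adds s to fitness.
    """
    outcomes = [(p, s * (1 - s_i)), (1.0 - p, s * (0 - s_i))]
    mean = sum(prob * dx for prob, dx in outcomes)
    var = sum(prob * (dx - mean) ** 2 for prob, dx in outcomes)
    return mean, var


def drift_and_diffusion(ps, states, s, r):
    """Sum the single-locus moments over loci, each exchanged at rate r."""
    moments = [exchange_moments(p, s_i, s) for p, s_i in zip(ps, states)]
    drift = r * sum(m for m, _ in moments)
    diffusion = r * sum(v for _, v in moments)
    return drift, diffusion


if __name__ == "__main__":
    ps = [0.1, 0.5, 0.9, 0.3]   # allele frequencies (illustrative)
    states = [1, 0, 1, 1]       # genotype of the focal individual (illustrative)
    print(drift_and_diffusion(ps, states, s=0.02, r=0.5))
```

Summed over loci, the drift equals $r$ times the difference between the population mean fitness $s \sum_i p_i$ and the individual's fitness $s \sum_i s_i$, and the diffusion coefficient equals $r$ times the fitness variance $s^2 \sum_i p_i (1 - p_i)$ of a population in linkage equilibrium.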